skip to main content


Search for: All records

Creators/Authors contains: "Jagadish, H. V."

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Data-driven algorithms are only as good as the data they work with, while datasets, especially social data, often fail to represent minorities adequately. Representation Bias in data can happen due to various reasons, ranging from historical discrimination to selection and sampling biases in the data acquisition and preparation methods. Given that “bias in, bias out,” one cannot expect AI-based solutions to have equitable outcomes for societal applications, without addressing issues such as representation bias. While there has been extensive study of fairness in machine learning models, including several review papers, bias in the data has been less studied. This article reviews the literature on identifying and resolving representation bias as a feature of a dataset, independent of how consumed later. The scope of this survey is bounded to structured (tabular) and unstructured (e.g., image, text, graph) data. It presents taxonomies to categorize the studied techniques based on multiple design dimensions and provides a side-by-side comparison of their properties. There is still a long way to fully address representation bias issues in data. The authors hope that this survey motivates researchers to approach these challenges in the future by observing existing work within their respective domains. 
    more » « less
    Free, publicly-accessible full text available December 31, 2024
  2. Diversity, group representation, and similar needs often apply to query results, which in turn require constraints on the sizes of various subgroups in the result set. Traditional relational queries only specify conditions as part of the query predicate(s), and do not support such restrictions on the output. In this paper, we study the problem of modifying queries to have the result satisfy constraints on the sizes of multiple subgroups in it. This problem, in the worst case, cannot be solved in polynomial time. Yet, with the help of provenance annotation, we are able to develop a query refinement method that works quite efficiently, as we demonstrate through extensive experiments.

     
    more » « less
    Free, publicly-accessible full text available October 1, 2024
  3. Relational queries are commonly used to support decision making in critical domains like hiring and college admissions. For example, a college admissions officer may need to select a subset of the applicants for in-person interviews, who individually meet the qualification requirements (e.g., have a sufficiently high GPA) and are collectively demographically diverse (e.g., include a sufficient number of candidates of each gender and of each race). However, traditional relational queries only support selection conditions checked against each input tuple, and they do not support diversity conditions checked against multiple, possibly overlapping, groups of output tuples. To address this shortcoming, we present Erica, an interactive system that proposes minimal modifications for selection queries to have them satisfy constraints on the cardinalities of multiple groups in the result. We demonstrate the effectiveness of Erica using several real-life datasets and diversity requirements.

     
    more » « less
    Free, publicly-accessible full text available August 1, 2024
  4. Free, publicly-accessible full text available June 4, 2024
  5. Data-centric methods designed to increase end-to-end reliability of data-driven decision systems.

     
    more » « less
  6. Perspectives on the role and responsibility of the data-management research community in designing, developing, using, and overseeing automated decision systems. 
    more » « less